Search for: All records

Creators/Authors contains: "Xu, Sugang"


  1. To accommodate the growing demand for cloud services, telecom carriers’ networks and datacenter (DC) facilities form large network–cloud ecosystems (ecosystems for short) that physically support these services. These large-scale ecosystems are continuously evolving and must be highly resilient to support critical services. Open and disaggregated optical-networking technologies promise to enhance interoperability across telecom carriers and DC operators, thanks to their open interfaces in both the data plane and the control/management plane. In the first part of this paper, we focus on a single entity (e.g., a telecom carrier or an emerging telecom/DC partnership company) that owns both the network and DC infrastructures in the ecosystem. We introduce a solution that leverages open and disaggregated technologies to enhance the resilience of the optical networks within a multi-vendor and multi-domain ecosystem. In the second part of this paper, we consider the case in which the networks and DCs are owned by different entities. In this case too, cooperation among datacenter providers (DCPs) and carriers is crucial to provide failure/disaster resilience to today’s cloud services. However, such cooperation is more challenging since DCPs and carriers, being different entities, may not disclose confidential information, e.g., detailed resource availability. Hence, we introduce a solution to enhance the resilience of such multi-entity ecosystems through cooperation between DCPs and carriers without violating confidentiality.

     
  2. We investigate the problem of enhancing the resilience of future optical network-cloud ecosystems. We introduce new solutions to build disaster-resilient single- and multi-entity network-cloud ecosystems with openness, disaggregation, and cooperation between networks and clouds.

     
  3. Optical network failure management (ONFM) is a promising application of machine learning (ML) to optical networking. Typical ML-based ONFM approaches exploit historical monitored data, retrieved in a specific domain (e.g., a link or a network), to train supervised ML models and learn failure characteristics (a signature) that will be helpful when future failures occur in that domain. Unfortunately, in operational networks, data availability often limits the practical deployment of ML-based ONFM solutions, because labeled data that comprehensively model all possible failure types are scarce. One could purposely inject failures to collect training data, but this is time-consuming and undesirable for operators. A possible solution is transfer learning (TL), i.e., training ML models on a source domain (SD), e.g., a laboratory testbed, and then deploying the trained models on a target domain (TD), e.g., an operator network, possibly fine-tuning the learned models by re-training with a small amount of TD data. Moreover, in those cases when TL re-training is not successful (e.g., due to the intrinsic difference between SD and TD), another solution is domain adaptation, which consists of combining unlabeled SD and TD data before model training. We investigate domain adaptation and TL for failure detection and failure-cause identification across different lightpaths, leveraging real optical SNR data. We find that, for the considered scenarios, domain adaptation can increase failure-detection accuracy by up to 20 percentage points, while for failure-cause identification, only combining domain adaptation with model re-training provides a significant benefit, reaching an accuracy increase of 4–5 percentage points in the considered cases.

     
  6. Network connectivity, i.e., the reachability of any network node from all other nodes, is often considered the default network survivability metric against failures. However, in the case of a large-scale disaster that disconnects multiple network components, network connectivity may not be achievable. On the other hand, with the service paradigm shifting towards the cloud in today’s networks, most services can still be provided as long as at least one content replica is available in each disconnected network partition. As a result, the concept of content connectivity has been introduced as a new network survivability metric under a large-scale disaster. Content connectivity is defined as the reachability of content from every node in a network under a specific failure scenario. In this work, we investigate how to ensure content connectivity in optical metro networks. We derive necessary and sufficient conditions and develop what we believe to be a novel mathematical formulation to map a virtual network over a physical network such that content connectivity for the virtual network is ensured against multiple link failures in the physical network. In our numerical results, obtained under various network settings, we compare the performance of mapping with content connectivity and with network connectivity, and show that mapping with content connectivity can guarantee higher survivability, lower network bandwidth utilization, and a significant improvement in service availability.

     
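
The difference between network connectivity and content connectivity described in the last abstract can be illustrated with a small sketch. It only checks the two definitions on a single graph under one failure scenario; it is not the virtual-over-physical mapping formulation developed in the paper. The six-node ring, the replica placement, and all function names are hypothetical choices made for illustration.

```python
from collections import defaultdict


def components(nodes, links, failed=()):
    """Return the connected components that remain after removing failed links."""
    failed_links = {frozenset(link) for link in failed}
    adj = defaultdict(set)
    for u, v in links:
        if frozenset((u, v)) not in failed_links:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def is_network_connected(nodes, links, failed=()):
    """Network connectivity: every node can reach every other node."""
    return len(components(nodes, links, failed)) == 1


def is_content_connected(nodes, links, replicas, failed=()):
    """Content connectivity: every partition holds at least one content replica."""
    replicas = set(replicas)
    return all(comp & replicas for comp in components(nodes, links, failed))


# Hypothetical example: a six-node ring with content replicas at A and D.
nodes = ["A", "B", "C", "D", "E", "F"]
links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "A")]
failed = [("B", "C"), ("E", "F")]  # a double link failure splits the ring in two

net_ok = is_network_connected(nodes, links, failed)
content_ok = is_content_connected(nodes, links, {"A", "D"}, failed)
content_ok_single = is_content_connected(nodes, links, {"A"}, failed)
print(net_ok, content_ok, content_ok_single)
```

In this assumed scenario the double failure partitions the ring into {A, B, F} and {C, D, E}, so network connectivity is lost, yet every node still reaches a replica when content is placed at both A and D. With a single replica at A, the partition {C, D, E} is cut off from the content.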